# Efficient Deployment
**Orpheus-3B-0.1-FT Q4_K_M GGUF** · freddyaboulton · Apache-2.0
GGUF quantized version of Orpheus-3B-0.1-FT, suitable for efficient inference.
*Large Language Model · English · 30 downloads · 1 like*
**DeepSeek-R1-Medical-COT GGUF** · tensorblock · Apache-2.0
DeepSeek-R1-Medical-COT is a Chain-of-Thought reasoning model specialized for the medical domain, offered in multiple quantized versions to accommodate different hardware.
*Large Language Model · English · 180 downloads · 1 like*
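For rough capacity planning with GGUF quants like those above, a file's size can be estimated from the parameter count and the quant's effective bits per weight. As an approximate, hedged figure, Q4_K_M averages on the order of 4.8–4.9 bits per weight across layers; the sketch below uses 4.85 as an assumed value:

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate model file size in GB: parameters x effective bits per weight.
    Ignores file metadata and any unquantized tensors, so treat it as a floor."""
    return n_params * bits_per_weight / 8 / 1e9

# 4.85 bits/weight is an assumed average for Q4_K_M; it varies by model.
q4_k_m = gguf_size_gb(3e9, 4.85)   # ~1.82 GB for a 3B model
fp16   = gguf_size_gb(3e9, 16.0)   # 6.0 GB at FP16, for comparison
```

This is why a 3B model that needs ~6 GB at FP16 fits comfortably in under 2 GB as a Q4_K_M GGUF.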
**Qwen2.5-VL-7B-Instruct FP8-Dynamic** · RedHatAI · Apache-2.0
FP8 quantized version of Qwen2.5-VL-7B-Instruct, supporting efficient vision-text inference through vLLM.
*Image-Text-to-Text · Transformers · English · 25.18k downloads · 1 like*
**DeepSeek-R1-Distill-Llama-70B FP8-Dynamic** · RedHatAI · MIT
FP8 quantized version of DeepSeek-R1-Distill-Llama-70B, which improves inference performance by reducing the bit width of weights and activations.
*Large Language Model · Transformers · 45.77k downloads · 9 likes*
**Molmo-7B-D-0924 NF4** · Scoolar · Apache-2.0
4-bit quantized version of Molmo-7B-D-0924; the NF4 quantization strategy reduces VRAM usage, making it suitable for environments with limited VRAM.
*Image-to-Text · Transformers · 1,259 downloads · 1 like*
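NF4 ("normal float 4") stores each weight as an index into a 16-entry codebook whose levels are denser near zero, matching the roughly normal distribution of trained weights. The sketch below illustrates the idea using evenly spaced normal quantiles as a stand-in codebook; the actual NF4 table used by bitsandbytes is constructed differently, so this is illustrative only:

```python
from statistics import NormalDist

def make_normalfloat_codebook(bits: int = 4) -> list[float]:
    """Toy normal-float codebook: evenly spaced quantiles of N(0, 1),
    rescaled so the largest magnitude is 1. Not the real NF4 table."""
    n = 2 ** bits
    nd = NormalDist()
    qs = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
    m = max(abs(q) for q in qs)
    return [q / m for q in qs]

def quantize_block(values: list[float], codebook: list[float]):
    # Absmax-scale the block, then snap each value to the nearest level.
    scale = max(abs(v) for v in values) or 1.0
    indices = [
        min(range(len(codebook)), key=lambda k: abs(v / scale - codebook[k]))
        for v in values
    ]
    return indices, scale

def dequantize_block(indices, scale, codebook):
    return [codebook[k] * scale for k in indices]

codebook = make_normalfloat_codebook()
indices, scale = quantize_block([0.1, -0.5, 0.9, 0.0], codebook)
restored = dequantize_block(indices, scale, codebook)
```

Each weight shrinks from 16 bits to a 4-bit index plus one shared scale per block, which is where the VRAM savings come from.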
**Pixtral-12B FP8-Dynamic** · RedHatAI · Apache-2.0
pixtral-12b-FP8-dynamic is a quantized version of mistral-community/pixtral-12b. Quantizing weights and activations to the FP8 data type reduces disk size and GPU memory requirements by approximately 50%. It is suitable for commercial and research use in multiple languages.
*Image-Text-to-Text · Safetensors · Multilingual · 87.31k downloads · 9 likes*
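The ~50% figure follows directly from the storage width: 16-bit weights take 2 bytes per parameter, FP8 takes 1. A quick back-of-the-envelope check for a 12B-parameter model (weights only; activations and KV cache are extra):

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory for model weights alone, in GB."""
    return n_params * bytes_per_param / 1e9

bf16 = weight_memory_gb(12e9, 2)  # 24.0 GB at 16-bit
fp8 = weight_memory_gb(12e9, 1)   # 12.0 GB at FP8, a 50% reduction
```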
**QQQ-Llama-3-8B-g128** · HandH1998 · MIT
A version of Llama-3-8B quantized to INT4 using the QQQ quantization technique with a group size of 128, optimized for hardware efficiency.
*Large Language Model · Transformers · 1,708 downloads · 2 likes*
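Group-wise quantization limits the blast radius of outliers: each group of 128 weights shares one scale, so a single large value only degrades precision within its own group. A minimal symmetric INT4 sketch of the idea (illustrative only; QQQ's actual kernels and scale handling differ):

```python
def quantize_int4_grouped(weights: list[float], group_size: int = 128):
    """Symmetric group-wise INT4: each group gets scale = absmax / 7,
    and values are rounded to integers in [-7, 7]."""
    qs, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid div-by-zero
        scales.append(scale)
        qs.extend(max(-7, min(7, round(w / scale))) for w in group)
    return qs, scales

def dequantize_int4_grouped(qs, scales, group_size: int = 128):
    return [q * scales[i // group_size] for i, q in enumerate(qs)]

# Toy group of 4 values for illustration; real groups hold 128 weights.
weights = [0.8, -0.05, 0.3, 0.0]
qs, scales = quantize_int4_grouped(weights, group_size=4)
restored = dequantize_int4_grouped(qs, scales, group_size=4)
```

The worst-case rounding error is half a quantization step (scale / 2), which is why smaller groups, and hence tighter scales, trade a little extra scale storage for better accuracy.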